
Comment category-scholar-!cn #675

Draft: wants to merge 2 commits into base: master


IceCodeNew
Collaborator

The following line should not be included by default, as it could ruin the out-of-the-box experience for the people who most likely need this category.

Fix #674

@database64128
Contributor

I don't think so. Most students actually use the Internet service from common ISPs "out of box". They most likely only switch to education networks on demand.

@database64128
Contributor

This PR also breaks the convention of categorizing domains using the host entity's location. Traditionally we reduce the inconvenience from this by using attributes like cn. But in this case, the user can simply use category-scholar-!cn.

@moetayuko
Contributor

> I don't think so. Most students actually use the Internet service from common ISPs "out of box". They most likely only switch to education networks on demand.

It turns out that my institute has special routing rules for database sites: the accessing IP shown on such sites belongs to CERNET, while the one shown on other sites belongs to China Telecom. Accessing paid databases is seamless, and there are no manual switches as far as I'm aware.

However, this doesn't mean that I agree with this PR. category-scholar-!cn is a mixture of both open access (OA) and paid databases: e.g., aclweb.org, sci-hub and google-scholar are OA and should remain proxied for better connectivity, while acm.org, ieee and elsevier are paid and thus should be removed from the list.

@IceCodeNew
Collaborator Author

> However, this doesn't mean that I agree with this PR. category-scholar-!cn is a mixture of both open access (OA) and paid databases: e.g., aclweb.org, sci-hub and google-scholar are OA and should remain proxied for better connectivity, while acm.org, ieee and elsevier are paid and thus should be removed from the list.

Yes, we definitely should split the list and maintain OA scholarly sites separately from sites that require a subscription for access.
PRs are welcome, as I barely have free time for this kind of work.

@database64128
Contributor

> It turns out that my institute has special routing rules for database sites: the accessing IP shown on such sites belongs to CERNET, while the one shown on other sites belongs to China Telecom. Accessing paid databases is seamless, and there are no manual switches as far as I'm aware.

> acm.org, ieee and elsevier are paid and thus should be removed from the list.

You are fortunate enough to attend schools that have good education network access for everyday use and it comes with such benefits for students. Unfortunately, most universities either don't have reliable education network plans to choose from, or don't provide this kind of access. And don't forget about remote learning.

@moetayuko
Contributor

> Unfortunately, most universities either don't have reliable education network plans to choose from, or don't provide this kind of access.

What's the use case of surfing paid databases without access to the academic content?

> And don't forget about remote learning.

Let's say I work from home with both a university VPN for paid content and v2ray for Google Scholar. v2ray, acting as the gateway, often takes priority over the uni VPN, so paid databases should still be whitelisted to allow forwarding them to the uni VPN.

@database64128
Contributor

> What's the use case of surfing paid databases without access to the academic content?

You can still view the abstract and other basic information if you have not subscribed or bought the paper.

> Let's say I work from home with both a university VPN for paid content and v2ray for Google Scholar. v2ray, acting as the gateway, often takes priority over the uni VPN, so paid databases should still be whitelisted to allow forwarding them to the uni VPN.

You are already going out of your way to use the university VPN. What's so difficult about adding a simple rule to use a direct connection for category-scholar-!cn? This project is a general-purpose domain list. Conventions and rules should not be bent to cater to niche use cases like this.
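A simplified sketch of why one extra rule is enough in a first-match router (in v2ray's routing, rules are evaluated in order and the first matching rule wins). The set contents, the suffix-matching helper `matches`, the `route` function and the `direct` fallback are illustrative assumptions, not the project's actual data or v2ray's implementation.

```python
# Hypothetical, simplified model of first-match domain routing.
# Rules are checked in order; the first rule whose domain set matches wins.

def matches(host: str, domains: set[str]) -> bool:
    """Simplified suffix match: an entry matches itself and its subdomains."""
    return any(host == d or host.endswith("." + d) for d in domains)


def route(host: str, rules: list[tuple[set[str], str]], default: str = "direct") -> str:
    """Return the outbound tag of the first matching rule, else the default."""
    for domains, outbound in rules:
        if matches(host, domains):
            return outbound
    return default


# Illustrative set contents only; not the project's actual lists.
SCHOLAR_NON_CN = {"ieee.org", "elsevier.com"}
GEOLOCATION_NON_CN = SCHOLAR_NON_CN | {"github.com"}

# Default setup: everything in geolocation-!cn goes through the proxy.
default_rules = [(GEOLOCATION_NON_CN, "proxy")]

# Campus setup: one extra rule, placed first, sends scholar sites direct.
campus_rules = [(SCHOLAR_NON_CN, "direct")] + default_rules

for host in ("ieeexplore.ieee.org", "github.com"):
    print(host, route(host, default_rules), route(host, campus_rules))
# ieeexplore.ieee.org proxy direct
# github.com proxy proxy
```

The ordering is the whole trick: if the direct rule were placed after the geolocation-!cn rule instead, the proxy rule would match these hosts first and the extra rule would never fire.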

@IceCodeNew
Collaborator Author

IceCodeNew commented Oct 20, 2021

> You can still view the abstract and other basic information if you have not subscribed or bought the paper.

You can still visit these sites without a proxy; I'm sure most of them are not blocked.
IMO, putting these domains in geolocation-!cn will affect users whose organizations have subscribed to these sites.
On the other hand, excluding these domains from geolocation-!cn will not leave people in China unable to access these sites.
Also, the latter group is less likely to access these domains, and does so less often. So excluding scholarly sites from geolocation-!cn seems sound to me.

After all, every commit here involves some trade-offs, and we cannot fit everyone's needs.
Which side seems to you to be the real majority here? Will you always give the same answer based on the number of people? Or should we take likelihood and other factors into account here?

@database64128
Contributor

> You can still visit these sites without a proxy; I'm sure most of them are not blocked.

Actually, the change in this PR doesn't affect my setup. These sites will always be connected via proxy in my setup.

> Will you always give the same answer based on the number of people?

It's not about which side is the majority. If merged, this change will set a bad precedent: some non-CN sites are purposely excluded from geolocation-!cn. You are basically changing the definition of geolocation-!cn, which is bad and could be very damaging IMO.

@IceCodeNew
Collaborator Author

> > You can still visit these sites without a proxy; I'm sure most of them are not blocked.

> Actually, the change in this PR doesn't affect my setup. These sites will always be connected via proxy in my setup.

> > Will you always give the same answer based on the number of people?

> It's not about which side is the majority. If merged, this change will set a bad precedent: some non-CN sites are purposely excluded from geolocation-!cn. You are basically changing the definition of geolocation-!cn, which is bad and could be very damaging IMO.

We have already had loads of discussions about following the definitions and similar topics, so I will skip them here (refer to #28 and others).
I would like to explain how my philosophy on maintaining this project has developed recently. Here is an example:
Where are you going to put the baijiayun and duitang sites? (Refer to #672)
These sites are also "changing" the definition of the cdn category IMO. But I am OK with putting baijiayun under the cdn category.

There is no way to categorize these sites precisely under the current project structure. And I have gradually found that pushing things too far from practicality does not do any good.
To compensate for the precision we have traded off, the feature for labeling attributes (attr) was introduced. But it is still not fully functional.
And even if you are OK with the part that has already been implemented and supported, there are a bunch of problems with using this feature (refer to #300 and other issues).

What is the point of all this? I mean, let's re-evaluate the point of sticking to the exact category definitions.
For reviewers, doing so helps us reach a consensus.
For users, the name of a category should tell them clearly how they are supposed to use it.

Does this very PR go against any of these points? I don't think so.
I'm not saying that we should DELETE the line that includes overseas scholarly sites in geolocation-!cn; the line is still there, just commented out. This doesn't go against any existing category rules.
As for the user side, well, I explained that before, and it seems you agree with me to some degree.

@IceCodeNew
Collaborator Author

> If merged

And BTW, this PR is not going to be merged. Not until the OA sites have been separated from the scholarly sites that require a subscription.

IceCodeNew marked this pull request as draft on October 20, 2021 15:43
@database64128
Contributor

> And I have gradually found that pushing things too far from practicality does not do any good.

I don't think removing a bunch of non-CN scholar sites from geolocation-!cn just because some campus networks have special optimizations for them is a "practical" move for the users of this project.

> These sites are also "changing" the definition of the cdn category IMO.

Technical terms like CDN usually refer to general concepts, and their uses and meanings can change over time. Geolocation, on the other hand, is a clear indication of service location. This concept has been used on the Internet with the same meaning for probably decades. I believe the categorization of our geolocation sets should not take into account non-geographical factors, like whether some sites are subscription-based scholar sites.

@IceCodeNew
Collaborator Author

IceCodeNew commented Oct 21, 2021

> > And I have gradually found that pushing things too far from practicality does not do any good.

> I don't think removing a bunch of non-CN scholar sites from geolocation-!cn just because some campus networks have special optimizations for them is a "practical" move for the users of this project.

> > These sites are also "changing" the definition of the cdn category IMO.

> Technical terms like CDN usually refer to general concepts, and their uses and meanings can change over time. Geolocation, on the other hand, is a clear indication of service location. This concept has been used on the Internet with the same meaning for probably decades. I believe the categorization of our geolocation sets should not take into account non-geographical factors, like whether some sites are subscription-based scholar sites.

We are not making progress. Let me put it this way: have we categorized all of the overseas sites here?

The work we have done covers just a fraction of the active sites geographically located outside China. So why are we not overwhelmed by issues complaining about it?
The long-tail effect describes the nature of this problem. It turns out that failing to include sites with few unique visitors (UV) does not matter.

So can we just pretend that we never included sites like IEEE or Elsevier, where most functions require a subscription? How likely is it that some random user comes here and submits an issue complaining that an overseas scholarly site they tried to access turned out to be blocked by the GFW?

And even if some users are affected by the move I propose here, they can easily solve the problem by including a category named, well, category-scholar-need-subscription or something similar. Your position, on the other hand, does not solve the existing problem.

Excluding domains in a routing configuration is way more difficult than including them. If we can prevent this from the beginning, why not? That way we do not leave behind another issue that cannot be solved anyway.

@ysakura99
Contributor

My uni doesn't provide CERNET access to dorms, and it is unrealistic to assume one always has CERNET access, even as a university student or academic. I think for those who need to route scholar-!cn directly, it shouldn't be too hard to add one rule themselves.

@moetayuko
Contributor

> My uni doesn't provide CERNET access to dorms, and it is unrealistic to assume one always has CERNET access, even as a university student or academic. I think for those who need to route scholar-!cn directly, it shouldn't be too hard to add one rule themselves.

I originally proposed splitting category-scholar-!cn into something like category-scholar-oa-!cn (e.g. google scholar) and category-scholar-paid-!cn (e.g. clarivate) so that end users can apply different routing rules. To retain the current behavior, the newly added categories could be included in the category-scholar-!cn meta-rule, so there would be no drawback at all. Anyone who needs access to paid databases can simply insert a direct rule for category-scholar-paid-!cn before geolocation-!cn.
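A minimal sketch of the proposed split, showing why including both new lists in the existing meta-list leaves what category-scholar-!cn resolves to unchanged. The list contents are illustrative assumptions, and `resolve` is a hypothetical helper loosely modeled on the `include:` lines of the project's data files, not the project's actual build tool.

```python
# Hypothetical model of the proposed split. Each list holds plain domains and
# "include:<list>" lines; the contents are illustrative, not the real data.
LISTS: dict[str, list[str]] = {
    "category-scholar-oa-!cn": ["aclweb.org", "scholar.google.com"],
    "category-scholar-paid-!cn": ["acm.org", "ieee.org", "elsevier.com"],
    # The existing name becomes a meta-list that includes both halves,
    # so anything that already references it sees the same set of domains.
    "category-scholar-!cn": [
        "include:category-scholar-oa-!cn",
        "include:category-scholar-paid-!cn",
    ],
}


def resolve(name: str, lists: dict[str, list[str]],
            seen: frozenset[str] = frozenset()) -> set[str]:
    """Recursively expand include: lines into a flat set of domains."""
    if name in seen:
        raise ValueError(f"include cycle involving {name}")
    domains: set[str] = set()
    for entry in lists[name]:
        if entry.startswith("include:"):
            domains |= resolve(entry[len("include:"):], lists, seen | {name})
        else:
            domains.add(entry)
    return domains


# The meta-list still resolves to all five domains, while the paid
# subset can now be targeted by a routing rule on its own.
print(sorted(resolve("category-scholar-!cn", LISTS)))
print(sorted(resolve("category-scholar-paid-!cn", LISTS)))
```

Under this model, a user who wants paid databases to go direct would place a direct rule for the paid list ahead of the geolocation-!cn rule, while everyone else keeps referencing category-scholar-!cn as before.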

@ysakura99
Contributor

> > My uni doesn't provide CERNET access to dorms, and it is unrealistic to assume one always has CERNET access, even as a university student or academic. I think for those who need to route scholar-!cn directly, it shouldn't be too hard to add one rule themselves.

> I originally proposed splitting category-scholar-!cn into something like category-scholar-oa-!cn (e.g. google scholar) and category-scholar-paid-!cn (e.g. clarivate) so that end users can apply different routing rules. To retain the current behavior, the newly added categories could be included in the category-scholar-!cn meta-rule, so there would be no drawback at all. Anyone who needs access to paid databases can simply insert a direct rule for category-scholar-paid-!cn before geolocation-!cn.

Yes, that makes much more sense to me. My uni also provides paywall-free access to sites like IEEE, but how about we just keep category-scholar-!cn as it is and split out the paid part into category-scholar-!cn-paid? I am not sure whether we should also split category-scholar-cn-paid out of category-scholar-cn.

Development

Successfully merging this pull request may close these issues.

Suggest removing paid databases such as elsevier and ieee
5 participants